Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(default-memory-limits): set default memory limits #4960

Merged

Conversation

ryanhristovski
Copy link
Contributor

Fixes #4882

  • sets default memory limits for shutdown manager to 32Mi and default containers to 512Mi (== to limit)
  • applied calculated max heap size envoy config to test data for new limit (512Mi)

@ryanhristovski ryanhristovski requested a review from a team as a code owner December 20, 2024 22:10
@zirain zirain force-pushed the feat-set-default-memory-limits branch from ceaaa52 to 4ac01d9 Compare December 24, 2024 12:14
@@ -215,6 +215,21 @@ spec:
typed_config:
"@type": type.googleapis.com/envoy.extensions.resource_monitors.downstream_connections.v3.DownstreamConnectionsConfig
max_active_downstream_connections: 50000
- name: "envoy.resource_monitors.fixed_heap"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why these changes happen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's set based on the memory limit by default, since memory limits weren't defined before they weren't being applied

Code here

// calculateMaxHeapSizeBytes calculates the maximum heap size in bytes as 80% of Envoy container memory limits.
// In case no limits are defined '0' is returned, which means no heap size limit is set.
func calculateMaxHeapSizeBytes(envoyResourceRequirements *corev1.ResourceRequirements) uint64 {
if envoyResourceRequirements == nil || envoyResourceRequirements.Limits == nil {
return 0
}
if memLimit, ok := envoyResourceRequirements.Limits[corev1.ResourceMemory]; ok {
memLimitBytes := memLimit.Value()
return uint64(float64(memLimitBytes) * 0.8)
}
return 0
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for clarify

@zirain zirain force-pushed the feat-set-default-memory-limits branch from 4ac01d9 to 7c5bdda Compare December 27, 2024 00:57
Copy link
Member

@cnvergence cnvergence left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zirain zirain merged commit fddcf9b into envoyproxy:main Dec 28, 2024
1 check passed
@zirain
Copy link
Member

zirain commented Dec 28, 2024

looks like 32 cause OOM on master.

shutdown-manager:
    Container ID:    containerd://e43b953745ffdc5c2e86cf97c061d9e3560b7d800b15b625ca27d4ba315f9ece
    Image:           docker.io/envoyproxy/gateway-dev:latest
    Image ID:        docker.io/envoyproxy/gateway-dev@sha256:cb9737c83359588d3c10683a90ad835ae62696fb3ef8df20383425048aba1133
    Port:            <none>
    Host Port:       <none>
    SeccompProfile:  RuntimeDefault
    Command:
      envoy-gateway
    Args:
      envoy
      shutdown-manager
      --ready-timeout=15s
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Sat, 28 Dec 2024 14:15:42 +0800
      Finished:     Sat, 28 Dec 2024 14:15:42 +0800
    Ready:          False
    Restart Count:  5
    Limits:
      memory:  32Mi
    Requests:
      cpu:      10m
      memory:   32Mi
    Liveness:   http-get http://:19002/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:19002/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
    Startup:    http-get http://:19002/healthz delay=0s timeout=1s period=10s #success=1 #failure=30
    Environment:
      ENVOY_GATEWAY_NAMESPACE:  envoy-gateway-system (v1:metadata.namespace)
      ENVOY_POD_NAME:           envoy-gateway-conformance-infra-all-namespaces-302def45-6brqfc7 (v1:metadata.name)
    Mounts:                     <none>

@ryanhristovski
Copy link
Contributor Author

@zirain hmm, it's generally not best practice to have memory request & limits not equal to each other - but given that this is a shutdown manager it could make sense to have separate values.

What do you think:

  1. set memory request to 32Mi and memory limit to something like 128Mi
    or
  2. set both request and limit to 128Mi

I didnt have time to fully benchmark it's resource use but it seems to sit idle around 25Mi and during shutdown (in my environment) it can hit around 70Mi.

I'm leaning #1 due to the nature of the shutdown manager - but this could cause resource issues for users in the future.

@zirain
Copy link
Member

zirain commented Dec 29, 2024

can you take a look at #4978?

@zirain
Copy link
Member

zirain commented Dec 30, 2024

@ryanhristovski this block CI, I didn't figure out why(test passed on my local).

let's revert it first, please resubmit later if you still want this, sorry for this.

zirain added a commit that referenced this pull request Dec 30, 2024
zirain added a commit that referenced this pull request Dec 30, 2024
zirain added a commit that referenced this pull request Dec 30, 2024
Revert "feat(default-memory-limits): set default memory limits  (#4960)"

This reverts commit fddcf9b.

Signed-off-by: zirain <[email protected]>
@sanposhiho
Copy link
Contributor

sanposhiho commented Jan 1, 2025

my 2p for the future:
Memory consumption depends on Go's GC cycle. In short, if CPU is busy for other stuff, Go runtime cannot take time for GC and hence memory consumption could go up. There's some env parameters (GOMEMLIMIT, GOGC) that can adjust the behaviour though, AFAIK, there's nothing to define a hard limit of memory.
CPU 10m looks too small, and also it doesn't have any limit. No CPU limit isn't a bad thing (rather I'm no-cpu-limit lover), but you have to put CPU limit when benchmarking to measure memory consumption for this kind of change. The reason is that, as I described first, there'd be a difference in memory consumption when your shutdown manager can use 10 CPU vs when your shutdown manager can use only 10m CPU. Given we only ensure that the container has 10m CPU, the default memory req/lim should be determined with the situation where the container truly has only 10m CPU available, which can be simulated with having 10m CPU limit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Define a default memory limit for shutdown-manager
4 participants